Fix ESP32 RSA watchdog timeout during key generation by valientegaston · Pull Request #3018 · toitlang/toit

valientegaston · 2026-04-06T03:51:25Z

Problem
On ESP32, generating a 2048-bit RSA key takes longer than 10 seconds. The watchdog timer then kills the process because it mistakenly believes the process is hung.

Root cause
The mbedtls RSA operations (rsa_generate, rsa_sign, rsa_encrypt, rsa_decrypt) were passing NULL as the context to the rsa_rng callback. The scheduler watchdog monitors the run_timestamp of each process and kills it if it hasn't been updated within the timeout window. Since the RSA key generation is a long-running blocking operation, the timestamp was never updated during its execution.

Fix
The rsa_rng callback is called many times by mbedtls during RSA operations — in particular during key generation, it is called repeatedly to generate prime candidates and random bases for Miller-Rabin primality tests. This makes it a natural and low-overhead place to reset the watchdog by updating the process run_timestamp.
The fix passes the Process* as the context to all rsa_rng calls, and inside rsa_rng the scheduler mutex is taken to safely update the timestamp.

Safety
We verified that primitives execute inside an Unlocker block in Scheduler::run_process, meaning the scheduler mutex is not held during RSA operations. Therefore, taking it inside rsa_rng is safe and will not cause a deadlock.
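The idea of this first approach can be sketched in isolation. The block below is a minimal illustration, not the code from the PR: `FakeProcess`, `rsa_rng_sketch`, and the field names are hypothetical stand-ins, and only the callback shape (`void*` context, output buffer, length, `0` on success) follows the mbedtls RNG callback convention.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <random>

// Hypothetical stand-in for the VM's process object; not the real Toit class.
struct FakeProcess {
  std::mutex* scheduler_mutex = nullptr;  // Assumed: guards run_timestamp_us.
  int64_t run_timestamp_us = -1;          // Assumed: what the watchdog inspects.
  int rng_calls = 0;
};

static int64_t now_us() {
  using namespace std::chrono;
  return duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count();
}

// Same shape as an mbedtls f_rng callback:
//   int f_rng(void* ctx, unsigned char* out, size_t len)
// A null context reproduces the old behavior: random bytes, no timestamp update.
static int rsa_rng_sketch(void* ctx, unsigned char* out, size_t len) {
  static std::random_device device;
  for (size_t i = 0; i < len; i++) out[i] = static_cast<unsigned char>(device());
  if (ctx != nullptr) {
    FakeProcess* process = static_cast<FakeProcess*>(ctx);
    // Take the (stand-in) scheduler mutex before touching the timestamp.
    std::lock_guard<std::mutex> lock(*process->scheduler_mutex);
    process->run_timestamp_us = now_us();
    process->rng_calls++;
  }
  return 0;  // 0 signals success to the caller.
}
```

Because the callback fires once per random request, the timestamp is refreshed as often as the library asks for randomness, which is why it was attractive as a hook for long-running key generation.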

valientegaston · 2026-04-06T04:27:42Z

I also noticed that CONFIG_MBEDTLS_HARDWARE_MPI is explicitly set to n across all ESP32 toolchain configurations (esp32, esp32s2, esp32s3, esp32c3, esp32c6). Was this intentional? I understand it wouldn't help with key generation (as the prime generation and Miller-Rabin tests run in software regardless), but it could potentially speed up everyday RSA operations like signing, verification, and encryption/decryption.

floitsch · 2026-04-06T10:21:20Z

Primitives shouldn't be blocking. We want to run other tasks when waiting for a primitive. As such, the scheduler revealed a real issue.
I think the best way is to run the key generation in a separate thread.
Have a look at src/event_sources/async_posix.cc and src/resources/spi_linux.cc (which uses it). With a bit of luck the same approach works on the ESP32.
It would be a bit more complicated:

create a "RsaGeneration" resource that contains the thread, and then do the creation of the key using that resource.

I'm guessing we set the MBEDTLS_HARDWARE_MPI to 'n' to reduce the image size. We can definitely enable it again.
I don't remember anymore if the RSA-key generation is behind an sdkconfig flag or not. I think we already have a TOIT_CRYPTO_EXTRA flag, though. So that's probably enough. The new resource should definitely be behind that one.


valientegaston · 2026-04-06T21:22:09Z

Hi! I have implemented the RSA key generation following your suggestions:

Non-blocking: The primitives are now non-blocking. The key generation runs in a separate thread, and the Toit Task is suspended while waiting for the resource event.

RsaGeneration Resource: I created a new RsaGeneration resource (contained in StructTag as RsaGenerationResourceTag and RsaGenerationResourceGroupTag) to manage the lifecycle of the asynchronous task.

Architecture: I followed the approach in async_posix.cc, using the AsyncEventThread to coordinate the background generation and the completion event.

Flags & Configuration: The new resource and primitives are correctly guarded by the CONFIG_TOIT_CRYPTO_EXTRA sdkconfig flag as you proposed. I've also updated the sdkconfig.defaults for ESP32 targets to enable this by default.

Binary Compatibility: I made sure to append the new tags at the end of the StructTag enum to avoid any issues with existing bootstrap snapshots.

Please let me know if there's any part of the implementation you'd like me to refine further. Thanks!
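As a rough illustration of the non-blocking design described above, the following sketch runs a slow job on its own thread and signals completion through a condition variable. It uses only the C++ standard library; `GenerationJob` and `slow_generate` are hypothetical names, the "key" is placeholder bytes, and none of this is the AsyncEventThread or resource API from the Toit sources.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generation resource.
class GenerationJob {
 public:
  // Starts the slow work on its own thread and returns immediately.
  void start(uint32_t seed) {
    worker_ = std::thread([this, seed] {
      std::vector<uint8_t> key = slow_generate(seed);
      std::lock_guard<std::mutex> lock(mutex_);
      result_ = std::move(key);
      done_ = true;
      done_cv_.notify_all();  // Plays the role of the completion event.
    });
  }

  // Non-blocking check, usable from the thread that started the job.
  bool is_done() {
    std::lock_guard<std::mutex> lock(mutex_);
    return done_;
  }

  // Blocks until the result is ready. In a VM the calling task would be
  // suspended and woken by the event instead of blocking a thread.
  std::vector<uint8_t> wait_and_take() {
    std::unique_lock<std::mutex> lock(mutex_);
    done_cv_.wait(lock, [this] { return done_; });
    return std::move(result_);
  }

  ~GenerationJob() {
    if (worker_.joinable()) worker_.join();
  }

 private:
  // Placeholder for the expensive computation: fills 32 bytes from a simple
  // linear congruential generator. Not cryptographic.
  static std::vector<uint8_t> slow_generate(uint32_t seed) {
    std::vector<uint8_t> bytes(32);
    for (size_t i = 0; i < bytes.size(); i++) {
      seed = seed * 1664525u + 1013904223u;
      bytes[i] = static_cast<uint8_t>(seed >> 24);
    }
    return bytes;
  }

  std::mutex mutex_;
  std::condition_variable done_cv_;
  std::thread worker_;
  std::vector<uint8_t> result_;
  bool done_ = false;
};
```

The caller would invoke `start`, return to the scheduler, and only collect the result once the completion signal has fired.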

- Add RsaGenerationResource_ class with finalizer/close pattern to ensure proper cleanup of resource group and state on exceptions (e.g. with-timeout).
- Add rsa-generate-close primitive to tear down the resource group.
- Fix dangling proxy in rsa_generate_finish error paths by adding missing clear_external_address calls.
- Add no-op Thread::cancel() for ESP32 (required since async_posix is now compiled for ESP32).

floitsch · 2026-04-07T20:22:50Z

Opened a fix PR on your fork: valientegaston#1

It addresses a few resource cleanup issues:

The resource group and state were not cleaned up on exceptions (e.g. with-timeout wrapping the generate call). Added a RsaGenerationResource_ class with the standard finalizer/close pattern.
Missing resource_proxy->clear_external_address() on error paths in rsa_generate_finish (dangling pointer).
Missing Thread::cancel() on ESP32 (linker error since async_posix is now compiled for ESP32).

Fix RSA resource cleanup and ESP32 Thread::cancel

floitsch · 2026-04-08T19:21:02Z

+          free(prv); free(pub);
+        } else {
+          // mbedtls writes from the end. Move to start.
+          unsigned char* prv_start = (unsigned char*)malloc(prv_len);


If we don't have enough memory here, we would have to throw away everything we did so far and start again.
Also, I'm not even sure things would behave correctly: by marking the error as MBEDTLS_ERR_PK_ALLOC_FAILED, rsa_generate_finish would return a "malloc failed". This would indicate to the system that the primitive ran out of memory and that we should do a GC, then try again. However, rsa_generate_finish would just fail now, as the resource_proxy has just been cleared.

-> We need to allocate the memory in the main thread (the one that called the primitive) and not inside the event thread. Unfortunately, this means that we need to know prv_len and pub_len before we actually call mbedtls_pk_write_key_der.

That said: it looks to me like a resize is the better option anyway. The prv_len/pub_len just seem to say how much of the given buffer was written to.

Replaced the two extra malloc calls with memmove on the same buffer to shift the data to the start, followed by realloc to shrink it to the exact size. If realloc fails it returns null without altering the original block, so the original pointer is preserved and passed to set_results correctly — no memory leak or dangling pointer.
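The memmove-then-realloc pattern described above can be shown on its own. This is a minimal sketch with a hypothetical helper name (`compact_tail`); it assumes the payload occupies the last `used` bytes of a buffer obtained from malloc, as in the "writes from the end" case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Moves the `used` bytes at the end of a `capacity`-byte malloc'ed buffer to
// the front, then tries to shrink the allocation to `used` bytes.
// Returns the pointer that owns the data afterwards. If realloc fails, the
// original (larger) block is still valid and is returned unchanged.
static unsigned char* compact_tail(unsigned char* buffer, size_t capacity, size_t used) {
  if (used == 0 || used > capacity) return buffer;  // Nothing sensible to do.
  // memmove, not memcpy: source and destination may overlap.
  memmove(buffer, buffer + capacity - used, used);
  void* shrunk = realloc(buffer, used);
  return shrunk != nullptr ? static_cast<unsigned char*>(shrunk) : buffer;
}
```

Either way the caller ends up with exactly one pointer to free, so a failed shrink costs some memory but does not leak or dangle.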

The same still applies to the 'malloc' in line 986/987. (the ret goes to the rsa_generate_finish, leading to an unrecoverable OOM). The mallocs need to be done in the main thread. Here we just use constants (RSA_PRV_DER_MAX_BYTES), so that's relatively easy to fix.

floitsch · 2026-04-08T19:22:05Z

  bool is_locked() const { return OS::is_locked(mutex_); }
  bool is_boot_process(Process* process) const { return boot_process_ == process; }

+  Mutex* mutex() const { return mutex_; }


Necessary?
If yes, we should try to find a better approach than to make the scheduler's mutex public.

No longer necessary. In the previous approach, the rsa_rng callback needed to take the scheduler mutex to update the process run_timestamp. With the new async design, the Toit process is suspended during key generation, so there's no need to update the timestamp from the RNG callback at all. The mutex() accessor has been removed.

floitsch · 2026-04-08T19:23:21Z

 CONFIG_MBEDTLS_X509_TRUSTED_CERT_CALLBACK=y
 CONFIG_MBEDTLS_ECP_RESTARTABLE=y
-CONFIG_MBEDTLS_HARDWARE_MPI=n
+CONFIG_MBEDTLS_HARDWARE_MPI=y


Is there any downside to enabling MPI?

For some esp32 variants we were fighting the partition sizes already. So if this increases the image size, we need to make sure we aren't too big now.

I measured the binary size impact on ESP32:

Without HARDWARE_MPI: 0x1561f0 bytes (18% free of 0x1a0000 partition)

With HARDWARE_MPI: 0x156390 bytes (18% free of 0x1a0000 partition)

The difference is only ~416 bytes.

I only tested on ESP32 so far — still checking the other variants.

ESP32S3:

Without HARDWARE_MPI: 0x14d5c0 bytes (20% free)
With HARDWARE_MPI: 0x14d840 bytes (20% free)
Difference: ~640 bytes

Still checking the remaining variants (esp32c3, esp32c6, esp32s2).

ESP32S2:

Without HARDWARE_MPI: 0x116a50 bytes (33% free)
With HARDWARE_MPI: 0x116ce0 bytes (33% free)
Difference: ~656 bytes

ESP32C3:

Without HARDWARE_MPI: 0x163e00 bytes (14% free)
With HARDWARE_MPI: 0x164560 bytes (14% free)
Difference: ~1888 bytes

ESP32C6:

Without HARDWARE_MPI: 0x1818c0 bytes (11% free)
With HARDWARE_MPI: 0x182020 bytes (11% free)
Difference: ~1888 bytes
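The differences quoted above follow directly from the hex sizes. The sketch below recomputes them; the struct and function names are illustrative only, and the free-space percentage is checked for ESP32 alone because 0x1a0000 is the only partition size stated in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Image sizes in bytes as reported in the thread, per target.
struct ImageSize {
  const char* target;
  uint32_t without_mpi;
  uint32_t with_mpi;
};

static const ImageSize kSizes[] = {
  {"esp32",   0x1561f0, 0x156390},
  {"esp32s3", 0x14d5c0, 0x14d840},
  {"esp32s2", 0x116a50, 0x116ce0},
  {"esp32c3", 0x163e00, 0x164560},
  {"esp32c6", 0x1818c0, 0x182020},
};

// Growth of the image, in bytes, from enabling hardware MPI.
static uint32_t growth_bytes(const ImageSize& size) {
  return size.with_mpi - size.without_mpi;
}

// Free space in the partition as a percentage, rounded to nearest.
static uint32_t free_percent(uint32_t used, uint32_t partition) {
  return ((partition - used) * 100 + partition / 2) / partition;
}
```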

floitsch · 2026-04-15T21:31:42Z

@valientegaston do you think you can address the feedback?

valientegaston · 2026-04-17T03:34:14Z

Hi! Yes, I'll be addressing the feedback in the coming days. We've been quite busy lately but I'll make sure to go through all the comments and make the necessary changes. Thanks for your patience!


floitsch

Still some subtle issues.
Not completely sure what the best way for the resource handling is.
Maybe one way is to keep track of whether the parallel thread is running and store that in the resource. When the resource gets the signal to destroy itself, it could then delay that action until the thread returns. Alternatively (if possible), killing the thread first would also work.
We do something slightly similar in the BLE code.
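One possible shape for the "delay teardown until the thread returns" idea is sketched below with standard C++ threads. `TrackedResource` and its members are hypothetical, and the sleep stands in for the key generation. This is not how the BLE code or the final PR does it.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical resource that records whether its worker is still running
// and refuses to finish teardown until the worker has returned.
class TrackedResource {
 public:
  void start_worker(int work_ms) {
    running_.store(true);
    worker_ = std::thread([this, work_ms] {
      std::this_thread::sleep_for(std::chrono::milliseconds(work_ms));
      worker_writes_++;       // The worker touches resource state at the end.
      running_.store(false);
    });
  }

  bool is_running() const { return running_.load(); }
  bool is_torn_down() const { return torn_down_; }
  int worker_writes() const { return worker_writes_; }

  // Called when the resource is asked to destroy itself. Joining first
  // guarantees the worker can no longer write to freed state.
  void tear_down() {
    if (worker_.joinable()) worker_.join();
    torn_down_ = true;
  }

  ~TrackedResource() { tear_down(); }

 private:
  std::thread worker_;
  std::atomic<bool> running_{false};
  int worker_writes_ = 0;  // Only read after join(), which synchronizes.
  bool torn_down_ = false;
};
```

The cost of this design is that teardown can block for as long as the worker runs, which is why being able to cancel the thread first is mentioned as the alternative.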

floitsch · 2026-04-20T18:55:10Z

+    }
+
+    if (ret == 0) {
+      unsigned char* prv = (unsigned char*)malloc(RSA_PRV_DER_MAX_BYTES);


When do we free the memory again? Shouldn't that happen in rsa_generate_finish?
But that's more complicated:

what if nothing ever calls 'rsa_generate_finish'? This means that the memory needs to be attached to the resource.

However, that immediately reveals another race condition (which was already present): what if the program stops while the generation is still running in parallel? The res resource could already be freed at that point.
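The ownership question raised here (who frees the buffers if nothing ever calls rsa_generate_finish) can be illustrated with a holder object that owns the memory. This is a sketch under assumptions: `KeyBuffers`, `MallocBuffer`, and the capacities are hypothetical, and it does not address the separate race with a still-running thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>

// Deleter so that unique_ptr releases malloc'ed memory with free().
struct FreeDeleter {
  void operator()(unsigned char* pointer) const { free(pointer); }
};
using MallocBuffer = std::unique_ptr<unsigned char, FreeDeleter>;

// Hypothetical holder that would live inside the resource. The buffers are
// freed when the holder is destroyed, even if nobody collects the result.
class KeyBuffers {
 public:
  // Allocates both buffers on the calling thread. Returns false and stays
  // empty if either allocation fails, so OOM is reported before work starts.
  bool allocate(size_t prv_capacity, size_t pub_capacity) {
    MallocBuffer prv(static_cast<unsigned char*>(malloc(prv_capacity)));
    MallocBuffer pub(static_cast<unsigned char*>(malloc(pub_capacity)));
    if (!prv || !pub) return false;
    prv_ = std::move(prv);
    pub_ = std::move(pub);
    return true;
  }

  bool is_allocated() const { return prv_ != nullptr && pub_ != nullptr; }
  unsigned char* prv() const { return prv_.get(); }
  unsigned char* pub() const { return pub_.get(); }

  // Hands ownership of the private-key buffer to the caller.
  MallocBuffer take_prv() { return std::move(prv_); }

 private:
  MallocBuffer prv_;
  MallocBuffer pub_;
};
```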

floitsch · 2026-04-20T18:56:17Z

+          free(prv); free(pub);
+        } else {
+          // mbedtls writes from the end. Move to start.
+          unsigned char* prv_start = (unsigned char*)malloc(prv_len);


The same still applies to the 'malloc' in line 986/987. (the ret goes to the rsa_generate_finish, leading to an unrecoverable OOM). The mallocs need to be done in the main thread. Here we just use constants (RSA_PRV_DER_MAX_BYTES), so that's relatively easy to fix.

floitsch · 2026-04-20T19:01:35Z

+          memmove(prv, prv + RSA_PRV_DER_MAX_BYTES - prv_len, prv_len);
+          memmove(pub, pub + RSA_PUB_DER_MAX_BYTES - pub_len, pub_len);
+
+          unsigned char* prv_resized = (unsigned char*)realloc(prv, prv_len);


Maybe add the following information:
RSA_PRV_DER_MAX_BYTES is typically ~550 bytes.
Typically, keys only need ~300 bytes.
As such the realloc is useful.
However, if it fails, it's not the end of the world either.

floitsch · 2026-04-20T19:01:58Z

+    }
+
+    if (ret == 0) {
+      unsigned char* prv = (unsigned char*)malloc(RSA_PRV_DER_MAX_BYTES);


Use unvoid_cast instead of C-style casts. Here and in all the other places.

floitsch · 2026-04-20T19:04:08Z

+  }
+
+  ByteArray* prv_der = process->allocate_byte_array(resource->prv_len());
+  ByteArray* pub_der = process->allocate_byte_array(resource->pub_len());


Another option is to use an external ByteArray and use the existing buffers. That would decrease the peak amount of memory.
I think for 250+ bytes it's ok to have an external byte-array.
However, at that point I would do another realloc, just in case the one in the other thread didn't succeed. If it still doesn't succeed, you can do a MALLOC_FAILED.

floitsch · 2026-04-20T19:04:53Z

+    resource->resource_group()->unregister_resource(resource);
+    resource_proxy->clear_external_address();


That's a bit too eager.
For OOMs the primitive will do a GC and then try again. If you destroy the resource the second attempt wouldn't work anymore.
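The retry behavior described above can be modeled without any VM code. In the sketch below, `PendingResult`, `FinishStatus`, and `finish_sketch` are hypothetical, and a boolean parameter simulates whether the allocation for the result succeeds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class FinishStatus { kOk, kMallocFailed };

// Hypothetical resource state: holds the finished key until it is handed over.
struct PendingResult {
  std::vector<uint8_t> key;
  bool released = false;
};

// `allocation_succeeds` simulates the heap allocation for the result object.
// On failure the pending state is left untouched so that a second call,
// made after a GC, can still succeed.
static FinishStatus finish_sketch(PendingResult& pending,
                                  bool allocation_succeeds,
                                  std::vector<uint8_t>* out) {
  if (!allocation_succeeds) return FinishStatus::kMallocFailed;
  *out = pending.key;
  pending.key.clear();
  pending.released = true;  // Only now is it safe to tear the resource down.
  return FinishStatus::kOk;
}
```

Releasing the resource only on the success path is what keeps the second attempt viable.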

Fix RSA 2048-bit deadlock on ESP32 by resetting watchdog in RNG callback

72bb36b

feat: add asynchronous RSA key generation and enable mbedTLS hardware…

07b3b8e

… MPI support across ESP32 targets.

floitsch mentioned this pull request Apr 7, 2026

Fix RSA resource cleanup and ESP32 Thread::cancel valientegaston/toit#1

Merged

floitsch and others added 2 commits April 7, 2026 22:25

Add trailing newline to rsa.toit

f6e6e8f

Merge pull request #1 from toitlang/rsa-resource-cleanup

0452296

Fix RSA resource cleanup and ESP32 Thread::cancel

floitsch reviewed Apr 8, 2026

Enable hardware MPI for ESP32, optimize RSA key memory allocation, an…

870bbbc

…d remove unused scheduler mutex accessor

floitsch reviewed Apr 20, 2026
